Create a profile plot of unaffected nail length vs. time for 50 group A subjects (you could use 50 randomly selected group A subjects, or first 50 group A subjects, or any 50 group A subjects of your choice) and based on the plot, comment on:
Import the toenail data (NB “toe.xlsx” is Toenail.xlsx with the first 8 rows deleted)
library(readxl); library(tidyverse)  # read_excel(); dplyr, tidyr, ggplot2
toe <- read_excel("toe.xlsx")
We will widen the data with the pivot_wider function, as the spread function has been superseded and is no longer under active development (likewise, pivot_longer replaces gather):
# creates a wide dataset:
toeW <- pivot_wider(toe, id_cols = c(id,treat), names_from = time, values_from = response)
# create a wide dataset with missing data omitted:
toeWnarm <- toeW %>% drop_na()
# convert missing omitted data to long:
toeLnarm <- pivot_longer(toeWnarm, cols = c("0","1","2","3","6","9","12"), names_to = "time", values_to = "response")
# convert time values back to numeric:
toeLnarm$time <- as.numeric(toeLnarm$time)
First we will select the group A subjects:
toe.A <- toe %>% filter(treat == 1)
Then we can convert this data to wide format and drop all rows with missing data:
toe.A.wide <- pivot_wider(toe.A, id_cols = id, names_from = time, values_from = response) %>% drop_na()
Select the first 50 subjects (rows):
toeA50Wnarm <- toe.A.wide[1:50,]
Convert the data back to long format:
toeA50Lnarm <- pivot_longer(toeA50Wnarm, cols = c("0","1","2","3","6","9","12"), names_to = "time", values_to = "response")
# time is a character column after pivot_longer; convert it back to numeric:
toeA50Lnarm$time <- as.numeric(toeA50Lnarm$time)
Creating the Plot:
ggplot(data = toeA50Lnarm,
mapping = aes(x = time, y = response, group = id)) +
geom_line() + geom_point()
# using the whole dataset:
#
# ggplot(data = toe[which(toe$treat==1),],
# mapping = aes(x = time, y = response, group = id)) +
# geom_line() + geom_point()
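The prompt also allows 50 randomly selected subjects rather than the first 50. A minimal sketch of that variant follows, using a small simulated long-format data frame (`sim`, hypothetical) in place of `toe`; the seed, group sizes, and simulated values are assumptions for illustration only:

```r
# Simulated stand-in for the long-format toenail data (hypothetical values)
set.seed(1)
sim <- expand.grid(id = 1:120, time = c(0, 1, 2, 3, 6, 9, 12))
sim$treat <- ifelse(sim$id <= 60, 1, 0)
sim$response <- 2 + 0.5 * sim$time + rnorm(nrow(sim))

# Draw 50 distinct subject ids at random from one treatment group
ids_A <- unique(sim$id[sim$treat == 1])
ids_50 <- sample(ids_A, size = 50)

# Keep every row (all time points) belonging to the sampled subjects
sim_A50 <- sim[sim$id %in% ids_50, ]
```

With the real data, `sim` would be replaced by `toe`, and the resulting subset could be passed to the same `ggplot()` call used for the first 50 subjects.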
comment on:
Create a profile plot of mean unaffected nail length vs. time for group A and Group B. Based on the plot, comment on:
Creating the plot (note that itraconazole appears on the left, and terbinafine on the right):
ggplot(data = toe,
mapping = aes(x = time, y = response, group = id)) +
geom_line() + geom_point() +
facet_grid(. ~ treat)
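The plot above shows individual trajectories within each treatment panel. One way to display the group means directly, as the question asks, is `stat_summary()`. The sketch below runs on simulated data (the data frame `sim` and its values are hypothetical), assumes `treat` is coded 0/1, and assumes a ggplot2 version that accepts the `fun` argument (3.3.0 or later):

```r
library(ggplot2)

# Simulated stand-in for the long-format toenail data (hypothetical values)
set.seed(2)
sim <- expand.grid(id = 1:40, time = c(0, 1, 2, 3, 6, 9, 12))
sim$treat <- ifelse(sim$id <= 20, 0, 1)
sim$response <- 2 + 0.5 * sim$time + rnorm(nrow(sim))

# One line per treatment group: the mean response at each time point
p_mean <- ggplot(sim, aes(x = time, y = response,
                          group = factor(treat), linetype = factor(treat))) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  labs(linetype = "treat")
p_mean
```

Drawing both groups in one panel makes it easier to compare the mean trends than reading across facets.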
Based on the plot, comment on:
Create a scatter plot of unaffected nail length vs. time and add a lowess curve for group A and group B, separately and comment on the trend of mean unaffected nail length over time for each group.
Creating the plot with lowess curve (note that itraconazole appears on the left, and terbinafine on the right):
ggplot(data = toe,
mapping = aes(x = time, y = response, group = id)) +
geom_line() + geom_point() +
facet_grid(. ~ treat) +
stat_smooth(aes(group = 1))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Comment: again it is not clear if there is any difference in mean unaffected nail length between these two treatments.
Create a scatter plot matrix of the repeated measurements over time and comment on how the correlation among the repeated measurements changes over time.
# first we can make a wide dataset with non-numeric column names:
toeWt <- pivot_wider(toe, id_cols = c(id,treat), names_from = time, values_from = response, names_prefix = "t")
# then we can create the scatterplot matrix with the pairs function:
pairs(~ t0 + t1 + t2 + t3 + t6 + t9 + t12, data = toeWt)
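As expected from the examples given in class, and as we would expect intuitively, the correlation between repeated measurements decreases as the time separation between them increases: measurements taken at adjacent occasions are more strongly correlated than those taken far apart.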
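The visual impression from the scatter plot matrix can be checked numerically with a correlation matrix. The sketch below uses simulated wide data (`simW` and its random-walk structure are assumptions, not the toenail data); the `use = "pairwise.complete.obs"` argument is included because the real wide data contain missing values:

```r
# Simulate 200 subjects whose measurements follow a random walk, so that
# occasions further apart are less correlated (hypothetical data)
set.seed(3)
n <- 200
times <- c(0, 1, 2, 3, 6, 9, 12)
steps <- matrix(rnorm(n * length(times)), nrow = n)
simW <- as.data.frame(t(apply(steps, 1, cumsum)))
names(simW) <- paste0("t", times)

# Pairwise correlations between measurement occasions
cors <- cor(simW, use = "pairwise.complete.obs")
round(cors, 2)
```

With the real data, the same call could be applied to the `t0` through `t12` columns of `toeWt`.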
Fit an appropriate model to answer the following questions and interpret your results. Write out the model and specify the null vs. alternative hypotheses for each question.
These questions are most appropriately answered by change score analysis, where \(\delta_{i}=y_{i1}-y_{i0}\) is defined as the change in unaffected nail length from baseline, \(t=0\) (\(y_{i0}\)), to one month, \(t=1\) (\(y_{i1}\)).
For questions 1 & 2, the model can be expressed as:
\[\delta_i = \beta_0 + \beta_1 x_i + \varepsilon_i\]
Where \(x_i\) is the treatment group variable (0 for itraconazole; 1 for terbinafine).
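In this model \(\beta_0\) is the mean change in the itraconazole group (\(x_i = 0\)) and \(\beta_1\) is the difference in mean change between terbinafine and itraconazole. The model yields a null and a two-sided alternative hypothesis for each of the questions above; for question 1:
\[H_0: \beta_0 = 0 \quad \text{vs.} \quad H_1: \beta_0 \neq 0\]
For question 2:
\[H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0\]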
The following code performs the change score analysis:
diff <- toeWt$t1 - toeWt$t0
toe_model <- lm(diff~ toeWt$treat)
summary(toe_model)
##
## Call:
## lm(formula = diff ~ toeWt$treat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.8154 -0.8154 0.1176 1.1176 3.1846
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.88239 0.10077 8.757 <2e-16 ***
## toeWt$treat -0.06696 0.14082 -0.475 0.635
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.201 on 289 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.0007817, Adjusted R-squared: -0.002676
## F-statistic: 0.2261 on 1 and 289 DF, p-value: 0.6348
We can therefore reject \(H_0\) for question 1, and fail to reject \(H_0\) for question 2. The mean change in unaffected nail length from baseline to one month in the itraconazole (reference) group differs significantly from zero (estimate 0.88, p < 0.001), but the change from baseline to one month does not differ significantly between the two treatment groups (estimated difference -0.07, p = 0.635).
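Interval estimates for both coefficients can be obtained with `confint()`. The sketch below uses simulated data (`simW` and all of its values are hypothetical, not the toenail data) and also illustrates that the slope test in a change score model gives the same p-value as a pooled-variance two-sample t-test on the changes:

```r
# Simulated baseline and one-month measurements (hypothetical values)
set.seed(4)
n <- 100
simW <- data.frame(treat = rep(c(0, 1), each = n / 2),
                   t0 = rnorm(n, mean = 2.5, sd = 2))
simW$t1 <- simW$t0 + 0.9 + rnorm(n)
simW$delta <- simW$t1 - simW$t0

# Change score model: intercept = mean change when treat == 0,
# slope = difference in mean change (treat 1 minus treat 0)
fit_cs <- lm(delta ~ treat, data = simW)
confint(fit_cs, level = 0.95)

# Same group comparison as a pooled-variance two-sample t-test
tt <- t.test(delta ~ treat, data = simW, var.equal = TRUE)
tt$p.value
```

Storing the change in a data frame column (here `delta`) rather than in a free-standing vector keeps the model call self-describing and avoids masking the base function `diff()`.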
Fit an appropriate model to answer the following questions and interpret your results. Write out the model and specify the null vs. alternative hypotheses for each question.
Note: Make sure to answer the questions AND interpret your results including providing basis (such as results of hypothesis testing, estimates of difference or change and its 95% CI) for your answers, and attach appropriate output if you want.
These questions are best answered through analysis of covariance (ANCOVA) of the post-intervention score, using the pre-intervention score as a covariate; thus an appropriate model for these questions can be expressed as:
\[y_{i1} = \beta_0 + \beta_1 x_i + \beta_2 y_{i0} + \varepsilon_i\]
Where \(x_i\) is the treatment group variable (0 for itraconazole; 1 for terbinafine).
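In this model \(\beta_1\) is the difference between treatment groups in unaffected nail length at one month, adjusted for baseline, and \(\beta_2\) is the slope of one-month length on baseline length, adjusted for treatment group. The model yields a null and a two-sided alternative hypothesis for each of the questions above; for question 1:
\[H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0\]
For question 2:
\[H_0: \beta_2 = 0 \quad \text{vs.} \quad H_1: \beta_2 \neq 0\]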
The following code performs the ANCOVA analysis:
toe_ANCOVA <- lm(t1 ~ treat + t0, data = toeWt)
summary(toe_ANCOVA)
##
## Call:
## lm(formula = t1 ~ treat + t0, data = toeWt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2317 -0.9651 -0.0084 0.9916 3.0349
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.00843 0.10935 9.222 <2e-16 ***
## treat -0.04329 0.13945 -0.310 0.756
## t0 0.92665 0.02627 35.275 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.187 on 288 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.8125, Adjusted R-squared: 0.8112
## F-statistic: 623.8 on 2 and 288 DF, p-value: < 2.2e-16
We can therefore fail to reject \(H_0\) for question 1, and reject \(H_0\) for question 2. After adjusting for baseline, unaffected nail length at one month does not differ significantly between treatment groups (estimate -0.04, p = 0.756); after adjusting for treatment group, unaffected nail length at one month is strongly related to baseline length (estimate 0.93, p < 0.001).
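The note above asks for estimates and 95% confidence intervals to support the answers. A minimal sketch of how these could be extracted from an ANCOVA fit is shown below, again on simulated data (`simW` and its values are hypothetical); it also checks that the treatment coefficient equals the difference between the two groups' predicted one-month means at a common baseline value:

```r
# Simulated baseline and one-month measurements (hypothetical values)
set.seed(6)
n <- 100
simW <- data.frame(treat = rep(c(0, 1), each = n / 2),
                   t0 = rnorm(n, mean = 2.5, sd = 2))
simW$t1 <- 1 + 0.9 * simW$t0 + rnorm(n)

# ANCOVA: one-month response on treatment, adjusting for baseline
fit_anc <- lm(t1 ~ treat + t0, data = simW)
confint(fit_anc, level = 0.95)

# Predicted one-month means for each group at the average baseline value
new <- data.frame(treat = c(0, 1), t0 = mean(simW$t0))
adj <- predict(fit_anc, newdata = new)
diff(adj)  # adjusted group difference, equal to the treat coefficient
```

Because the model has no treatment-by-baseline interaction, the adjusted difference is the same at every baseline value; the mean is used here only as a convenient reference point.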
What is Lord’s paradox? Why could the paradox occur?
Lord’s paradox arises in before-and-after studies that compare groups, when a t-test on the before-to-after change leads to a different conclusion from an ANCOVA of the final score adjusted for the initial score.
This apparent paradox can occur because the ANCOVA and t-test models rest on two fundamentally different sets of causal assumptions, and thus address two different questions. Pearl’s conceptualization of the initial measurement as a mediator of the effect of the exposure on the final measurement illustrates this difference [1]. Consider Glymour et al.’s example of the effect of educational attainment on cognitive change, with cognitive ability measured at baseline and again at the end of the study [2]. Under Pearl’s mediation framework, initial cognitive ability can be conceptualized as a mediator of the effect of education on final cognitive ability. This suggests that the ANCOVA approach assesses a causal effect of education on final cognitive ability, with adjustment for initial cognitive ability justified only if it is assumed not to be a mediator, whereas the t-test analysis assesses a causal effect of the exposure on the difference in measurements.
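A small simulation can make the disagreement between the two analyses concrete. The sketch below rests on illustrative assumptions that are not taken from the cited examples: two groups with different baseline means, a stable underlying trait, and independent measurement noise at each occasion, so that there is no true change in either group:

```r
# Two groups with different baseline means and no true change over time
set.seed(5)
n <- 500
group <- rep(c(0, 1), each = n)
true_score <- rnorm(2 * n, mean = ifelse(group == 1, 3, 0), sd = 1)
y0 <- true_score + rnorm(2 * n)  # baseline = stable trait + noise
y1 <- true_score + rnorm(2 * n)  # follow-up = same trait + new noise

# Change score analysis: difference in mean change between groups
fit_change <- lm(I(y1 - y0) ~ group)
# ANCOVA: group difference at follow-up, adjusting for baseline
fit_ancova <- lm(y1 ~ group + y0)

coef(summary(fit_change))["group", ]
coef(summary(fit_ancova))["group", ]
```

Under these assumptions the change score analysis estimates a group difference near zero, whereas the ANCOVA group coefficient is clearly nonzero because the fitted baseline slope is below one. Neither result is wrong; the two models answer different questions.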
Clark illustrates this framework with the following directed acyclic graph [3]:
Under what conditions is ANCOVA model appropriate to use? Under what conditions is ANCOVA model NOT appropriate to use?
The above discussion of the origins of Lord’s paradox as improper adjustment for a mediator suggests conditions for appropriate use of the ANCOVA and t-test approaches. ANCOVA is appropriate when the first measurement cannot be said to mediate the exposure-outcome relation, as in Wright’s example of the effect of supplementary instruction (SI) on arithmetic test attainment [4]. Conversely, ANCOVA is not appropriate when the first measurement may itself be affected by the exposure, since adjusting for it would then amount to adjusting for a mediator. Wright further notes that “(t test) asks whether the average gain in score is different for the two groups… (ANCOVA) asks whether the average gain, partialling out pre-scores, is different between the two groups”. This subtle distinction shows that a slight difference in approach remains even when mediation issues have been resolved.